[GLUTEN-11302][VL] Fix gpu build by bumping to cuda-13.1#11275

Merged
zhouyuan merged 15 commits into apache:main from zhouyuan:wip_fix_gpu_build
Dec 18, 2025

@zhouyuan zhouyuan commented Dec 10, 2025

What changes are proposed in this pull request?

Fix the GPU build by:

  • switching to gcc-14
  • bumping to cuda-toolkit-13.1

cuda-toolkit-13.1 requires more disk space, so this patch also modifies the GHA workflow to clean up disk space first.
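
A minimal sketch of what such a cleanup step could look like, not the workflow change made in this PR. The function name `free_disk_space` and the example directories are assumptions for illustration; which directories are safe to delete depends on the runner image.

```shell
#!/usr/bin/env bash
# Sketch: remove large, unused directories before installing the CUDA toolkit.
# The directories are passed in by the caller; nothing is hard-coded here.

free_disk_space() {
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      echo "removing $dir"
      rm -rf -- "$dir"
    else
      echo "skipping $dir (not present)"
    fi
  done
  # Report the remaining space on the root filesystem.
  df -h / || true
}

# Example call (hypothetical paths, commented out so that sourcing this
# file has no side effects):
# free_disk_space /usr/share/dotnet /opt/ghc
```

On a hosted runner the deletion would likely need elevated permissions, so the call may have to run under `sudo`.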

How was this patch tested?

Passes GHA.

fixes: #11302

Signed-off-by: Yuan <yuanzhou@apache.org>
${VELOX_BUILD_PATH}/_deps/nvtx3-src/c/include
${VELOX_BUILD_PATH}/_deps/nvcomp_proprietary_binary-src/include
${VELOX_BUILD_PATH}/_deps/rapids_logger-src/include
/usr/local/cuda/include/cccl

@bdice bdice Dec 10, 2025

If possible, we should try to fix this by calling find_package for cudf. It should set up all these include paths. I'll try to repro this locally with @karthikeyann and make a suggestion.

We don't want to require a specific CUDA version just to get a particular CCCL version -- those don't always move in lockstep and sometimes RAPIDS requires CCCL versions that have been publicly released but are not yet shipped in a CUDA toolkit.

@zhouyuan zhouyuan (Member Author)

@bdice thanks for the input. Yes, I think we ran into a version mismatch between RAPIDS rmm and CUDA. Initially in Gluten we prepared a Docker env with cuda-12.8 pre-installed; everything worked, and the CMake piece targeted that old env. However, with the recent cudf-25.12 change it no longer compiles, so I'm experimenting with how to fix this. In my local env I need to bump to cuda-13.1, otherwise there are issues with some header definitions. I also tried cuda-12.9 and cuda-13.0; neither works:

/__w/incubator-gluten/incubator-gluten/dev/../ep/build-velox/build/velox_ep/_build/release/_deps/rmm-src/cpp/include/rmm/detail/cuda_memory_resource.hpp:23:49: error: 'synchronous_resource_with' is not a member of 'cuda::mr'
   23 | inline constexpr bool resource_with = cuda::mr::synchronous_resource_with<Resource, Properties...>;
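
One way to check ahead of a full build whether a given set of CCCL headers provides the missing name is to search the include directory for it. This is only a rough sketch: the helper name `headers_define_symbol` is hypothetical, and a plain text match does not prove the symbol lives in the `cuda::mr` namespace.

```shell
#!/usr/bin/env bash
# headers_define_symbol DIR SYMBOL
# Returns 0 if SYMBOL appears anywhere under DIR, 1 if it does not,
# and 2 if DIR does not exist.
headers_define_symbol() {
  local include_dir="$1" symbol="$2"
  [ -d "$include_dir" ] || return 2
  # -r: recurse, -q: quiet, -F: treat the symbol as a fixed string
  grep -rqF -- "$symbol" "$include_dir"
}

# Example call, using the include path from the CMake change in this PR
# (commented out so that sourcing this file has no side effects):
# headers_define_symbol /usr/local/cuda/include/cccl synchronous_resource_with \
#   && echo "symbol found" || echo "symbol missing"
```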

I filed #11407 as a follow-up!

Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
This reverts commit 648e2f2.
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
This reverts commit 85a2064.
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
@zhouyuan zhouyuan changed the title [VL] Fix gpu build by bumping to cuda-13.1 [GLUTEN-11302][VL] Fix gpu build by bumping to cuda-13.1 Dec 16, 2025
@zhouyuan zhouyuan marked this pull request as ready for review December 16, 2025 08:16
echo "enable GPU support."
COMPILE_OPTION="$COMPILE_OPTION -DVELOX_ENABLE_GPU=ON -DVELOX_ENABLE_CUDF=ON -DCMAKE_CUDA_ARCHITECTURES=70 \
-DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.8/bin/nvcc"
COMPILE_OPTION="$COMPILE_OPTION -DVELOX_ENABLE_GPU=ON -DVELOX_ENABLE_CUDF=ON -DCMAKE_CUDA_ARCHITECTURES=75 \
@zhouyuan zhouyuan (Member Author)

cuda-13.1 supports architecture 75 at minimum.
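
If the build script ever needs to work with more than one toolkit, the architecture could be derived from the CUDA version instead of being hard-coded. The sketch below is an assumption, not part of this PR: `default_cuda_arch` is a hypothetical helper, and the value 70 for older toolkits is simply what the script used with cuda-12.8.

```shell
#!/usr/bin/env bash
# default_cuda_arch VERSION
# Prints the CMAKE_CUDA_ARCHITECTURES value to use for a CUDA version
# string such as "12.8" or "13.1".
default_cuda_arch() {
  local major="${1%%.*}"   # strip everything after the first dot
  if [ "$major" -ge 13 ]; then
    echo 75                # CUDA 13.x: 75 is the minimum supported
  else
    echo 70                # value previously used with cuda-12.8
  fi
}

# Example use (CUDA_VERSION is a hypothetical variable; commented out so
# that sourcing this file has no side effects):
# COMPILE_OPTION="$COMPILE_OPTION -DCMAKE_CUDA_ARCHITECTURES=$(default_cuda_arch "$CUDA_VERSION")"
```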

Contributor

@rui-mo rui-mo left a comment

Thanks!

@zhouyuan zhouyuan merged commit 4e35abc into apache:main Dec 18, 2025
114 of 117 checks passed
Linked issue: [VL] GPU build failure due to recent CUDF upgrade